A simulation study to compare robust clustering methods based on mixtures
Authors
Abstract
The following mixture-model-based clustering methods are compared in a simulation study with one-dimensional data, a fixed number of clusters, and a focus on outliers and uniform “noise”: a maximum likelihood estimator (MLE) for Gaussian mixtures; an MLE for a mixture of Gaussians and a uniform distribution (interpreted as a “noise component” to catch outliers); an MLE for a mixture of Gaussian distributions in which a uniform distribution over the range of the data is fixed (Fraley and Raftery in Comput J 41:578–588, 1998); a pseudo-MLE for a Gaussian mixture with an improper fixed constant over the real line to catch “noise” (RIMLE; Hennig in Ann Stat 32(4):1313–1340, 2004); and MLEs for mixtures of t-distributions with and without estimation of the degrees of freedom (McLachlan and Peel in Stat Comput 10(4):339–348, 2000). The RIMLE, using a method for choosing the fixed constant first proposed in Coretto (The noise component in model-based clustering. PhD thesis, Department of Statistical Science, University College London, 2008), is the best method in some simulation setups and acceptable in all of them, and can therefore be recommended.
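Two of the noise-component estimators above differ mainly in how the density of the noise term is set. The following is a minimal EM sketch in pure Python. It assumes one-dimensional data, a known number k of Gaussian clusters, and a noise term with a fixed density. Passing 1/(max − min) mimics the Fraley and Raftery fixed uniform over the data range. Passing a user-chosen constant c mimics the RIMLE's improper constant, because the EM updates take the same form. The data-driven choice of c from Coretto (2008) is not reproduced here, and neither are the estimated-uniform and t-mixture estimators. The function names `em_with_noise` and `hard_labels`, the starting values, and the `min_sigma` lower bound on the standard deviations are illustrative assumptions. They are not the paper's implementation or its scale constraints.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_with_noise(data, means, sigmas, noise_density, n_iter=500,
                  min_sigma=1e-2, tol=1e-9):
    """EM for a 1-D Gaussian mixture plus a noise term with FIXED density.

    Index 0 is the noise term (same density everywhere); indices 1..k are
    Gaussians. Returns mixing weights, means, sigmas and the
    responsibilities computed in the last E-step.
    """
    k = len(means)
    means, sigmas = list(means), list(sigmas)
    weights = [1.0 / (k + 1)] * (k + 1)
    n = len(data)
    prev_ll = -math.inf
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each term for each point.
        resp, ll = [], 0.0
        for x in data:
            dens = [weights[0] * noise_density]
            dens += [weights[j + 1] * normal_pdf(x, means[j], sigmas[j])
                     for j in range(k)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: mixing proportions, including the noise proportion.
        for j in range(k + 1):
            weights[j] = sum(r[j] for r in resp) / n
        # M-step: Gaussian parameters; the noise density is never updated.
        for j in range(k):
            nj = sum(r[j + 1] for r in resp)
            if nj <= 0.0:
                continue  # empty component: keep previous parameters
            mu = sum(r[j + 1] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j + 1] * (x - mu) ** 2 for r, x in zip(resp, data)) / nj
            means[j] = mu
            # Assumed crude lower bound on sigma, so that a component cannot
            # collapse onto a single point (a degenerate likelihood spike).
            sigmas[j] = max(math.sqrt(var), min_sigma)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, sigmas, resp


def hard_labels(resp):
    """Label 0 = noise, 1..k = Gaussian with the largest posterior."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


# Usage: two clusters plus a few gross outliers (synthetic, seeded data).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
data += [rng.gauss(10.0, 1.0) for _ in range(100)]
data += [-40.0, -35.0, 45.0, 50.0]

# Fraley-and-Raftery-style choice: uniform density over the data range.
c = 1.0 / (max(data) - min(data))
w, mu, sd, resp = em_with_noise(data, [-1.0, 11.0], [1.0, 1.0], c)
labels = hard_labels(resp)
```

Keeping the noise density fixed is the design choice that separates these two estimators from the estimated-uniform MLE. There, the support of the uniform would itself be updated in each M-step.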
Journal: Adv. Data Analysis and Classification
Volume 4
Published: 2010